Historic,  Archive  Document 

Do  not  assume  content  reflects  current 
scientific  knowledge,  policies,  or  practices. 



United States Department of Agriculture
National Agricultural Statistics Service
Research and Applications Division

A Production Forecasting Model for Corn

NASS Staff Report Number SRB-87-02

December 1987


Thomas  R.  Birkett 


A  PRODUCTION  FORECASTING  MODEL  FOR  CORN.  By 

Thomas  R.  Birkett;  National  Agricultural  Statistics  Service;  U.S. 
Department  of  Agriculture;  Washington,  D.C.  20250;  November 
1987,  Staff  Report  No.  SRB-87-02. 


ABSTRACT

This report introduces a regression model for corn production that
uses summarized data as input and allows for interaction between
ear counts and ear lengths. In a comparison of forecasting accuracy
covering the period 1980-1985, this model outperformed by a
wide margin the current objective yield models in August and
September and equaled the performance of the Agricultural Statistics
Board in August but not in September. The method of generating
estimates with this model is entirely objective. It is recommended
that the current objective yield modeling system be augmented with
this new model’s procedures.


This report was prepared for limited distribution to the
research community outside the U.S. Department of Agriculture.
The views expressed herein are not necessarily
those of NASS or USDA.


ACKNOWLEDGMENTS

The author wishes to thank Ben Klugh for his help in understanding
the corn objective yield program and for his many suggestions
that were instrumental in making this analysis successful. The
author also wishes to thank Ron Steele and Fred Vogel for their
suggestions in the review process.


CONTENTS

SUMMARY
INTRODUCTION
THE MODEL
METHODS
COMPOSITE FORECAST
RESULTS
DISCUSSION
CONCLUSION
RECOMMENDATIONS
BIBLIOGRAPHY


SUMMARY 


This report presents a regression model for corn that produces a
direct forecast of corn production for the 10 objective yield States.
The model is based upon State-level estimates of acres for harvest,
plant and ear counts, and ear sizes. Using the season-final
Agricultural Statistics Board (ASB) production as the final value, a
comparison of forecasting accuracy was done. The ASB performance, the
proposed new corn forecasting model, and the current objective yield
models were compared for August and September corn production
forecasts. The comparison showed the proposed corn model
outperformed the current objective yield models by a wide margin in
August and September and outperformed the ASB in August but
not September.

The current objective yield procedure uses separate sample-level
regression models for final ear counts and ear weights. The
approach presented in this paper differs from the current procedure
in the following ways: (1) the sample-level data are summarized to
the 10-State level (State level for the State models) before being
entered in the model, and (2) the ear count by ear size interaction
term is included in one model, instead of having a separate model
for each variable. The range of the model includes normal yielding
years as well as the drought years 1980 and 1983 and the high
yielding year 1985. The method of generating estimates with this model
is entirely objective. A 90 percent prediction interval for August
would be approximately plus or minus 10 percent.

Many relationships are examined with the discovery, at the regional
level, that two variables do consistently well in August and one
variable outperforms all others in September. Each of these variables
is the product of an acreage, a count, and a size variable. The model
form is Y = bX, so it is a no-intercept model and has only one
parameter, b. The variables that do well in August are X = (total
acres) x (number of ears with kernels per acre) x (average kernel
row length per ear), and X = (total acres) x (number of stalks with
ears per acre) x (average kernel row length per ear). In
September the superior variable is X = (total acres) x (number of ears
and ear shoots per acre) x (average kernel row length per ear).
State models are also developed, and the regional model is used to
prorate State estimates to the predicted value of the regional model.

It  is  recommended  the  proposed  forecast  model  be  used  to  augment 
the  current  NASS  objective  yield  modeling  system. 


A Production Forecasting Model for Corn

Thomas R. Birkett


INTRODUCTION


This paper develops a regression model for forecasting corn
production that shows accurate predictive capabilities across a broad
spectrum of years.

The model was developed using the corn objective yield data for
1980-1985. The development is analogous to a technique used for
forecasting yields of fruit and nut crops the author developed while
working in the California State Statistical Office. The functional
form of the independent variable is found in the work of Fecso
(1975).

The statistical philosophies behind the existing model and the
proposed model are summarized below:

Existing  objective  yield  models: 

(1) The models are built with sample-level data, so an individual
sample can have a large influence on the parameter estimates.
These are straight-line models and their parameter estimation
procedures can be unstable in the presence of the numerous
extreme values that exist in the sample-level data. An
automated procedure has been developed to identify and
remove these extreme values, restoring some stability. However,
by deleting observations, the properties of the estimated
parameters are no longer necessarily optimal. Existence of the
outliers also provides evidence that a straight-line model does
not adequately describe the underlying relationship between
the dependent and independent variables over the entire
measurement space.

(2) There are separate models for final number of ears and ear
weight. The lack of model fit in each model may be
compounded when the results are combined.


(3) Interaction between ear counts and ear sizes is not included
until the predicted values from the two models are combined.

Proposed objective yield model:

(1) Unbiased estimates of the 10-State level mean number of ears
per square foot and mean length per ear are constructed from
the sample-level data and the State acreages. Because of the large
sample size, the independence of the samples, and the central limit
theorem, the approximate distribution of these estimators is
known. Any individual sample can have only a small effect
upon these estimated parameters. In particular, the variance is
small because of the large sample size.

(2) One model that includes the ear count by ear size interaction
is used. The independent variables are those calculated in
(1) and are consequently very stable.

(3) This model is based on the same input as existing procedures,
but should much more closely approximate the underlying
relationship between the dependent and independent variables.

The remainder of this paper has the following format. First, the
basic algebraic formulas for the possible versions of the independent
variable X are presented. The survey design allows the construction
of 64 possible combinations of (acreage variable) x
(count variable) x (size variable). Unbiased estimators for number
of counts per acre and average size per ear are defined as part of
the construction of X. Next, in the Methods section, a way of
selecting the best version of X is presented, along with a method
for combining forecasts from more than one X. The Results section
presents the data set and discusses the performance of the
superior X’s, the Agricultural Statistics Board, and the current
objective yield models in predicting the final Agricultural Statistics
Board production. The performance of each is compared and
conclusions are drawn based on these results.


THE MODEL

The model has the form Y = bX, and is consequently a one-variable
no-intercept model. (A hypothesis test for a zero intercept has a
p-value of 0.8.) When the variable X contains acreage information,
Y is production. When it does not, Y represents yield. In the
development in this paper, acreage information is included as part
of X, so Y represents production. The results are almost as good if
X does not include acreage and Y is yield. When X does not
include acres, the predicted yield would be multiplied by the
acreage estimate to get production. The methodology can be
implemented either way.

The predictor variable X for production is constructed from
variables collected in the objective yield survey, combined with
acreage information from the June Enumerative Survey (JES). The
variable X is a function of all information from all the samples in
all States for each month. The underlying relationship between this
variable and final corn production will be modeled by a straight
line through the origin. Consequently there is only one parameter
to estimate, and the model can be fit with as little as one year of
historical data. The zero intercept also sharply reduces the variance
of the slope parameter estimate.
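
As an illustration of the one-parameter fit described above, the following is a minimal sketch of least squares for a line through the origin, where the slope is the sum of the cross products divided by the sum of the squared X values. The function name and the numbers are hypothetical and are not taken from the report; they are chosen only so the arithmetic is easy to follow.

```python
def fit_through_origin(x, y):
    """Least-squares slope b for the no-intercept model Y = bX."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least one nonzero value")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical history (not data from the report): three past years.
x_hist = [100.0, 200.0, 300.0]
y_hist = [2.0, 4.0, 6.0]

b = fit_through_origin(x_hist, y_hist)

# Forecast for a new year whose X value is 250 (also hypothetical).
forecast = b * 250.0
```

Because only one parameter is estimated, the same function works with a single year of history, which is the property the text points out.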

The JES and the corn objective yield survey provide the following
variables: (1) State acres: the JES Harvest to Planted (jes/hp) and
the JES Direct Expansion (jes/de); (2) sample-level counts per
square foot: number of stalks (s), number of stalks with ears (se),
number of ears and silked ear shoots (eaes), and number of ears
with kernels (ek); and (3) sample-level ear sizes: mean kernel
row length (krl) and mean length over husk (loh), each from a
subsample of ears.

The count variables are transformed to a square foot basis because
the area of each sample varies (due to the differing row widths).
For this transformation, it is assumed that the area of each sample
is equal to the area of the rectangle formed by the 15-foot
measurement and the eight row width measurement divided by two. The
two size variables are also weighted by the counts per square foot
at the sample level to obtain unbiased estimates of the mean size
per ear. Preliminary analysis led to limiting the analysis to samples
with maturity 3 or greater. (The maturity coding system for the
samples is 1-2, before ears are present; 3-6, ears present; and 7,
harvest. Almost all samples have maturity greater than 2 after
August.) In the formulas that follow, all sample counts have been
converted to counts per square foot.
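
The square foot conversion can be sketched as follows. This is only an illustration under stated assumptions: the eight row width measurement is taken to be recorded in feet, and the function names and example numbers are hypothetical rather than part of the survey documentation.

```python
def sample_area_sq_ft(eight_row_width_ft, row_length_ft=15.0):
    """Assumed sample area: 15 feet times half of the eight row width."""
    return row_length_ft * eight_row_width_ft / 2.0


def count_per_sq_ft(count, eight_row_width_ft):
    """Convert a raw sample count to a count per square foot."""
    return count / sample_area_sq_ft(eight_row_width_ft)


# Hypothetical sample: eight row widths span 20 feet and 60 ears were counted.
area = sample_area_sq_ft(20.0)
ears_per_sq_ft = count_per_sq_ft(60, 20.0)
```

Putting every sample on a per square foot basis is what allows samples with different row widths to be averaged together in the formulas that follow.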

The construction of the independent variable in the model, $X_{jk_1k_2l}$,
is a series of straight and weighted means. The dependent variable,
Y, is Board production in billions of bushels. The series of
weightings starts with the sample-level data and continues in stages to the
10-State regional level. The sample is considered random inside of
each State. And, because of the transformation of the count data,
there is a random sample of square feet within each State. The
number of ears per square foot varies, so the size variables are
weighted by the number of ears they represent. The State acreages
also vary.

Unbiased estimates of the individual State and 10-State regional
mean counts per square foot and mean sizes per ear are derived
separately. (Estimated values of these parameters for 1980-1985
for the 10-State region and for Iowa are plotted in figures 1 and 2
of Results.) These means are multiplied together at the State and
regional levels. In the following, the references to the two acreage
variables, four count variables, and two size variables are to those
variables already described as making up the input data. Starting
with the sample-level data, the construction of $X_{jk_1k_2l}$ has the
following sequence.

Let $I$ denote the set of all States in the survey in any given month,
and let $I'$ denote the set of States in the survey in any given month
that have at least one completed sample with size data (of maturity 3
or greater). In the following, the subscript $i \in I$ will always refer to
the States, the subscript $j \in \{1,2\}$ will represent the acreage
variable, $k_1$ and $k_2 \in \{1,2,3,4\}$ will refer to the count variables,
and $l \in \{1,2\}$ will always represent the size variable.


The independent variable $X_{jk_1k_2l}$ has the functional form (regional
acreage estimate) x (regional mean count per square foot estimate)
x (regional mean size per ear estimate):

$$X_{jk_1k_2l} = A_j \, C_{jk_1} \, S_{jk_1k_2l} \tag{1}$$

where $A_j$ is the sum of the State acreage estimates for acreage
variable $j$ and is defined as:

$$A_j = \sum_{i \in I} a_{ij} \tag{2}$$

where $a_{ij}$ = the acreage for State $i$, acreage variable $j$.


$C_{jk_1}$ is the weighted average of the State mean counts per square
foot, weighted by the State acreages, and is defined as

$$C_{jk_1} = \frac{\sum_{i \in I'} a_{ij} \, c_{ik_1}}{\sum_{i \in I'} a_{ij}} \tag{3}$$

where $c_{ik_1}$ = the mean count per square foot for count variable $k_1$ in
State $i$.


$$c_{ik_1} = \frac{1}{n_i} \sum_{m=1}^{n_i} c_{ik_1m} \tag{4}$$

where $c_{ik_1m}$ is the count per square foot for count variable $k_1$, State
$i$, sample $m$, and $n_i$ is the sample size in State $i$ for samples with
maturity $\in \{3,4,5,6\}$.

$S_{jk_1k_2l}$ is the regional mean size per ear, weighted from the sample
level to the State level by count variable $k_2$, and weighted from the
State level to the regional level by (acreage variable $j$) x (count
variable $k_1$).

$$S_{jk_1k_2l} = \frac{\sum_{i \in I'} a_{ij} \, c_{ik_1} \, s_{ik_2l}}{\sum_{i \in I'} a_{ij} \, c_{ik_1}} \tag{5}$$

where $c_{ik_1}$ is defined above, and $s_{ik_2l}$ is the mean size per ear for size
variable $l$ in State $i$, using as sample-level weights count variable
$k_2$.


$$s_{ik_2l} = \frac{\sum_{m=1}^{n_i} c_{ik_2m} \, s_{ilm}}{\sum_{m=1}^{n_i} c_{ik_2m}} \tag{6}$$

where $s_{ilm}$ = the size per ear for size variable $l$, State $i$, sample $m$;
$c_{ik_2m}$ is the count per square foot for count variable $k_2$, State $i$,
sample $m$; and $n_i$ = number of samples in State $i$ with maturity
$\in \{3,4,5,6\}$.
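
The staged weighting in equations (1) through (6) can be sketched in a few lines of code. This is a minimal illustration for a single acreage variable, not the program used for the report: the data layout (a dictionary of States, each with an acreage and a list of samples), the function names, and all numbers are hypothetical.

```python
def state_mean_count(samples, k1):
    """Equation (4): straight mean of the sample counts per square foot."""
    return sum(s["counts"][k1] for s in samples) / len(samples)


def state_mean_size(samples, k2, l):
    """Equation (6): sample sizes weighted by count variable k2."""
    num = sum(s["counts"][k2] * s["sizes"][l] for s in samples)
    den = sum(s["counts"][k2] for s in samples)
    return num / den


def regional_x(states, k1, k2, l):
    """Equations (1)-(3) and (5) for one acreage variable.

    `states` maps a State name to {"acres": a_i, "samples": [...]}, where
    the sample list holds only samples of maturity 3 to 6.
    """
    total_acres = sum(st["acres"] for st in states.values())       # (2), set I
    with_data = [st for st in states.values() if st["samples"]]    # set I'
    a = [st["acres"] for st in with_data]
    c = [state_mean_count(st["samples"], k1) for st in with_data]
    s = [state_mean_size(st["samples"], k2, l) for st in with_data]
    ac = [ai * ci for ai, ci in zip(a, c)]
    c_reg = sum(ac) / sum(a)                                        # (3)
    s_reg = sum(w * si for w, si in zip(ac, s)) / sum(ac)           # (5)
    return total_acres * c_reg * s_reg                              # (1)


# Hypothetical three-State region; State "C" has no samples with size data.
states = {
    "A": {"acres": 10.0, "samples": [
        {"counts": {"ek": 0.4}, "sizes": {"krl": 6.0}},
        {"counts": {"ek": 0.6}, "sizes": {"krl": 7.0}},
    ]},
    "B": {"acres": 30.0, "samples": [
        {"counts": {"ek": 0.5}, "sizes": {"krl": 6.0}},
    ]},
    "C": {"acres": 10.0, "samples": []},
}

x = regional_x(states, k1="ek", k2="ek", l="krl")
```

In this sketch the means come only from States with size data, while the acreage total covers every State, so a State without usable samples still contributes its acres to X.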




METHODS 


The computer was used to calculate the values of the 64 $X_{jk_1k_2l}$
defined in (1). Values for State models were also calculated.

There were, again, two acreage variables, four count variables, and
four different weighted versions of each of two ear size variables.
This results in 64 possible variable combinations of the form
acreage x count x size, the general form for X.
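
The count of 64 follows from the factorial structure: 2 acreage variables x 4 count variables x 4 sample-level weights x 2 size variables. A short sketch that enumerates the combinations, using the variable abbreviations given earlier as labels (the list names themselves are hypothetical):

```python
from itertools import product

ACREAGE = ["jes/hp", "jes/de"]        # subscript j
COUNT = ["s", "se", "eaes", "ek"]     # subscripts k1 and k2
SIZE = ["krl", "loh"]                 # subscript l

# Each tuple is (acreage j, count k1, sample-level weight k2, size l).
combos = list(product(ACREAGE, COUNT, COUNT, SIZE))
n_combos = len(combos)
```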

Modeling all 64 combinations is a systematic way of maximizing
the information gained from the survey. The fact that the designers
of the survey chose to include certain variables is sufficient reason
to model them. After examining analyses such as these, the
designers may decide that certain variables need no longer be
collected. The reason for modeling all four weighted versions of the
two size variables is that, a priori, it is possible that one weighting
count variable may outperform the others. Later in this paper it
will be examined whether some combining of variables can take
place, thus reducing the overall number.

Table 1 portrays the factorial process used to construct the 64
acreage x count x size variables. This table is a pictorial
representation of the algebraic derivations in the previous section. For
each variable the model Y = bX is fit, with b estimated by least
squares (and Y is Board production).

Table 1 -- Construction of 64 possible independent variables for the model by
multiplying together one acreage, one count, and one size variable, and eight
additional independent variables obtained by multiplying one acreage and one
count variable.

64 acreage x count x size variables (one entry from each column):

Acreage (j)   Count (k1)            Size (l), with sample-level weight (k2)

jes/hp        stalks                kernel row length (stalks)
jes/de        stalks with ears      kernel row length (stalks with ears)
              ears and ear shoots   kernel row length (ears and ear shoots)
              ears with kernels     kernel row length (ears with kernels)
                                    length over husk (stalks)
                                    length over husk (stalks with ears)
                                    length over husk (ears and ear shoots)
                                    length over husk (ears with kernels)

8 acreage x count variables (one entry from each column):

Acreage (j)   Count (k1)

jes/hp        stalks
jes/de        stalks with ears
              ears and ear shoots
              ears with kernels

As an example, $X_{1111}$ is (jes/hp) x (stalks) x (kernel row length
weighted by stalks at the sample level). $X_{2442}$ is (jes/de) x (ears
with kernels) x (length over husk weighted by ears with kernels at
the sample level). Also included in the table are eight acreage x
count variables whose purpose is explained below.

Because some States do not have any completed samples with
maturity 3 or greater in August, it is not always possible to
calculate State values for the 64 $X_{jk_1k_2l}$ in the first month of the survey.
This is because if there are no samples with ears then there is no
estimate of the average size per ear (the $s_{ik_2l}$). To fill this void,
eight additional variables were created, corresponding to the two
acreage variables multiplied by the four count variables. Values
for these eight variables will always be available, and their addition
results in a total of 72 possible independent variables to choose
from for each month of the survey before harvest. Preliminary
analysis resulted in limiting the first 64 model variables to samples
with maturities of 3 or greater. The last eight, which lack a size
component, include all samples.

In August, the number of samples with size data is limited, and, as
stated earlier, some States will not have any samples with maturity 3
or greater. For these States, the model is limited to the eight
acreage x count variables. To calculate the value of the 64 $X_{jk_1k_2l}$
for the 10-State regional model in August, the method calculates
the mean counts and mean sizes for the subregion of States that do
have size data (the set $I'$), and then applies these means to the
10-State region. Because almost all samples have reached maturity 3
by September, the variable X for September is based on a large
amount of data.


COMPOSITE FORECAST

At this point what is available from this system are 72 one-variable
no-intercept regression models. The question is, what variable
should be used to make the forecast in the current year, or how can
the forecasts from several models best be combined? A regression
model with more than one independent variable is not considered
at this point because the number of years in
the model precludes this. With n = 6 it is better to limit the
number of parameters in each regression to one. This restriction
can be lifted as more years of data become available.
Consequently, to use the information from more than one variable we
will use an average of the predicted values from each one-variable
model. The method of combining the forecasts is as follows. (See
Houseman.)

From a preliminary analysis it was determined that the combining
of forecasts would work better if the set of 64 variables with a size
component were combined to 16. This combining involved
averaging across the $k_2$ subscript of each set of four $X_{jk_1k_2l}$ with the same
$j$, $k_1$, and $l$ subscripts.

$$X_{jk_1l} = \frac{1}{4} \sum_{k_2=1}^{4} X_{jk_1k_2l} \tag{8}$$

$k_2$ represents the sample-level weight on the size variable. The
interpretation of this procedure is that the sample-level weight on
the size variables now becomes the average of the four count
variables. After this is done there are 16 variables with a size
component and 8 variables without one, for a total of 24 variables. The
16 now represent the 2 acreage by 4 count by 2 size factorial
generation of all possibilities. The 8 represent the 2 acreage by 4
count generation of possibilities. At this point the forecasting
system has 24 one-variable regression models generating 24 $\hat{Y}$'s.

$$(\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_{24}) \tag{9}$$
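
The reduction from 64 variables to 16 in equation (8) is a plain average over the sample-level weight subscript. A minimal sketch, assuming the 64 values are held in a dictionary keyed by the four subscripts; the function name is hypothetical and the values are placeholders, not data from the report:

```python
from itertools import product


def average_over_k2(x):
    """Map (j, k1, k2, l) -> value to (j, k1, l) -> mean over k2."""
    groups = {}
    for (j, k1, k2, l), value in x.items():
        groups.setdefault((j, k1, l), []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Placeholder values: each X is set equal to its k2 subscript so that the
# mean over k2 = 1..4 is easy to check by hand.
x = {
    (j, k1, k2, l): float(k2)
    for j, k1, k2, l in product((1, 2), (1, 2, 3, 4), (1, 2, 3, 4), (1, 2))
}

x_avg = average_over_k2(x)
n_with_size = len(x_avg)     # variables with a size component
n_total = n_with_size + 8    # plus the eight acreage x count variables
```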

The composite model is the average of the forecasted values from a
subset of the 24 available forecasts. (In this case of composite
estimation it was found that equal weights were optimal, once the
members of the composite model were selected.) This subset was
chosen to be the subset that had minimum estimated variance in
predicting final Board production. (It was not possible to look at
all possible subsets, so a forward selection type procedure was
utilized to select the best subset.) Stated more explicitly, the subset of
the 24 $\hat{Y}_i$'s whose mean has minimum estimated variance is the
composite model.

$$\hat{Y} = \frac{1}{k} \sum_{i=n_1}^{n_k} \hat{Y}_i \tag{10}$$

where $(\hat{Y}_{n_1}, \ldots, \hat{Y}_{n_k})$ are the subset members whose mean has
minimum variance over all possible subsets.
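
The report does not spell out how the variance of a subset mean was estimated or exactly how the forward selection proceeded. The sketch below is therefore only one plausible reading, under two stated assumptions: the variance is estimated by the mean squared error of the equal-weight average against the final value over the historical years, and forecasts are added one at a time for as long as that error decreases. All names and numbers are hypothetical.

```python
def mean_sq_error(forecasts, actual, subset):
    """Mean squared error of the equal-weight average of the chosen forecasts."""
    total = 0.0
    for t, y in enumerate(actual):
        avg = sum(forecasts[name][t] for name in subset) / len(subset)
        total += (avg - y) ** 2
    return total / len(actual)


def forward_select(forecasts, actual):
    """Greedy forward selection (an assumption, not the report's exact rule)."""
    chosen, best = [], float("inf")
    remaining = set(forecasts)
    while remaining:
        # Try adding each remaining forecast; keep the one that helps most.
        score, name = min(
            (mean_sq_error(forecasts, actual, chosen + [n]), n)
            for n in sorted(remaining)
        )
        if score >= best:
            break  # no remaining candidate lowers the error any further
        chosen.append(name)
        remaining.remove(name)
        best = score
    return chosen, best


# Hypothetical history: three years of final values and three forecasts.
actual = [5.0, 6.0, 7.0]
forecasts = {
    "a": [5.2, 6.2, 7.2],   # runs slightly high
    "b": [4.8, 5.8, 6.8],   # runs slightly low
    "c": [6.0, 7.0, 8.0],   # runs far too high
}

chosen, best = forward_select(forecasts, actual)
```

In this toy case the two forecasts with offsetting errors are selected and the poor one is left out, which is the behavior an equal-weight composite is meant to exploit.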

To estimate the stability and error rate of this procedure, a jackknife
type of analysis was devised. For each combination of five years
taken from the six years 1980-1985, of which there are six,
the minimum variance criterion identified the subset of
variables to make up the composite model. The mean predicted value
of this subset was determined for the omitted year, along with the
difference between it and the actual value of Y. In this method the
predicted value and the actual value are functions of different years
and therefore independent. Jackknifing is a way of guarding against
the possibility that so many combinations were looked at that a
random good fit would occur, even though it would not hold in reality.
Detailed results from this jackknife appear in the next section. The
variables in the composite model each year and the prediction errors are
presented so one can observe the stability and accuracy of this
procedure. This method is compared to the record of the Agricultural
Statistics Board (ASB) over the same period.
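
The leave-one-year-out idea can be sketched for a single one-variable model as follows. This is a simplification: the procedure described above also repeats the subset selection within each set of five years, which is omitted here. The function names and the data are hypothetical.

```python
def fit_through_origin(x, y):
    """Least-squares slope for Y = bX."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def leave_one_year_out(x, y):
    """Percent error for each year when b is fit on the other years only."""
    errors = []
    for held in range(len(x)):
        x_fit = [v for i, v in enumerate(x) if i != held]
        y_fit = [v for i, v in enumerate(y) if i != held]
        b = fit_through_origin(x_fit, y_fit)
        predicted = b * x[held]
        errors.append(100.0 * abs(predicted - y[held]) / y[held])
    return errors


# Hypothetical six-year history in which Y is exactly proportional to X,
# so every held-out year is predicted without error.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

errors = leave_one_year_out(x, y)
```

Because the held-out year never enters the fit, each error is an out-of-sample error, which is what makes the comparison with the ASB record a fair one.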


RESULTS

Observed values of the ears with kernels per square foot and the
kernel row length per ear are shown in figure 1 and figure 2.
Figure 1 presents plots for the 10-State region and figure 2 for Iowa
covering 1980-1985. The August values are from only the samples
with maturity greater than or equal to 3 (ears present).


Figure 1 -- Observed values of ears with kernels per square foot and average kernel row length per ear,
1980-1985, for the 10 State region. The symbol represents the year.

[Figure 1: 10 State Region, 1980-1985. Two scatter plots, one for August and one for September. Horizontal axis: ears with kernels per square foot. Vertical axis: kernel row length per ear in inches.]


These plots illustrate why the model works. High yielding years
appear in the upper right hand corner, because they have both large
sizes and high counts. The lowest yielding year is in the lower left
hand corner, where the years with low counts and low sizes are
located. Also, the general orientation does not change when going
from August to September, which supports the validity of the early
August size data, as it is modeled here.

The observed data indicate that in August 1983 the survey picked
up early indications of low counts and very small sizes. 1983 was
a drought year in the Midwest. The survey also picked up early
signs of low counts in 1980, another year with below average
rainfall.

1981 had normal counts and very large sizes. 1982 and 1984 are
characterized by normal sizes and high numbers of ears. As
mentioned, 1980 had very low numbers and normal sizes, while 1983
had the lowest yields of the period with somewhat below normal
counts and small ears.

In August 1985 the counts of ears with kernels were the highest of
the period, and the ear sizes were above average. In September of
that year, the counts remained at record high levels, but the sizes
were in the average range. Heavy rains fell in the corn belt
between the August and September surveys that year, slowing
down the growth in ear size.


Figure 2 -- Observed values of ears with kernels per square foot and average kernel row length per ear,
1980-1985, for Iowa. The symbol represents the year.

[Figure 2: Iowa, 1980-1985. Two scatter plots, one for August and one for September. Horizontal axis: ears with kernels per square foot. Vertical axis: kernel row length per ear in inches.]

The  pattern  for  Iowa  follows  the  regional  pattern.  In  1982  in  Iowa 
there  were  no  samples  with  maturity  3  in  August. 




Figure 3 presents the plot for the best single-variable August
model, ears with kernels x kernel row length (ek x krl). Notice the
relationship is very linear through the origin. Later it will be
shown that this variable, in conjunction with one other variable,
forms the best composite model for August.


Figure 3 -- August model with ears with kernels x kernel row length (ek x krl). The
symbol represents the year.

[Figure 3: 10 State Region, 1980-1985, August. Scatter plot. Horizontal axis: ears with kernels * kernel row length (scaled). Vertical axis: Board production.]


Figure 4 exhibits a plot of the best September variable, ears and
ear shoots x kernel row length (eaes x krl). The variation around
the fitted line is small, with the largest residual at 1985.


Figure 4 -- The September model with ears and ear shoots x kernel row length (eaes
x krl). The symbol represents the year.

[Figure 4: 10 State Region, 1980-1985, September. Scatter plot. Horizontal axis: ears and ear shoots * kernel row length (scaled). Vertical axis: Board production.]


Table  2  exhibits  the  performance  of  the  composite  model. 


Table 2 -- 10 State jackknife results, 1980-85.

Month      Year  Variables in         Model      Board      Board   Model   Board
                 composite model 1/   predicted  predicted  final   error   error
                                      value      value
                                      ------ billion bushels ------  -- percent --

August     80    ek x krl, se x krl   5.74       5.60       5.61     2.5     0.2
           81    ek x krl, se x krl   6.40       6.40       6.74     4.3     5.0
           82    ek x krl, se x krl   6.48       6.96       6.85     5.4     1.6
           83    ek x krl, se x krl   3.68       4.27       3.34    10.2    27.8
           84    ek x krl, se x krl   6.24       6.32       6.26     0.3     1.0
           85    ek x krl, se x krl   7.51       6.84       7.38     1.8     7.3

September  80    eaes x krl           5.58       5.54       5.61     0.5     1.2
           81    eaes x krl           6.77       6.59       6.74     0.4     2.2
           82    eaes x krl           7.04       6.96       6.85     2.8     1.6
           83    eaes x krl           3.75       3.53       3.34    12.3     5.7
           84    eaes x krl           6.45       6.17       6.26     3.0     1.4
           85    eaes x krl           6.81       7.02       7.38     7.7     4.9

1/ Variable abbreviations: s = stalks, se = stalks with ears, eaes = ears and ear shoots,
ek = ears with kernels, krl = kernel row length, loh = length over husk.
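
The error columns appear to be absolute differences from the Board final value, expressed as a percent of that final value. The report does not state the formula, so this is an assumption; the sketch below applies it to the August 1983 row. The function name is hypothetical.

```python
def percent_error(predicted, final):
    """Absolute error as a percent of the final value (assumed definition)."""
    return 100.0 * abs(predicted - final) / final


# August 1983 row of Table 2, in billion bushels.
model_83, board_83, final_83 = 3.68, 4.27, 3.34

model_err = round(percent_error(model_83, final_83), 1)
board_err = round(percent_error(board_83, final_83), 1)
```

Under this definition the August 1983 entries of 10.2 and 27.8 percent are reproduced. Rows computed from the rounded table values may differ slightly from the printed errors, which were presumably computed before rounding.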


In August the number of variables in the composite is two: ears
with kernels x kernel row length (ek x krl), and stalks with ears x
kernel row length (se x krl).

In September the number of variables in the composite model is
one: ears and ear shoots x kernel row length (eaes x krl).

Since the variables selected for the August and September models
are the same in every jackknife year, there is evidence that the
system contains an inherent stability.


This  system  will  now  be  compared  to  the  Agricultural  Statistics 
Board  (ASB)  and  the  current  objective  yield  models. 


Table  3  —  Average  prediction  errors  of  this  objective  yield  model,  the 
ASB,  and  the  current  objective  yield  models,  based  on  1980-85. 


                                 Average percent prediction error
Month      State                  Model     ASB  Current OY
                                                     Models

August     Illinois                10.3    10.3        20.1
           Indiana                 10.0     9.0        16.5
           Iowa                     7.6     9.4        12.6
           Michigan                 8.2     5.8         6.3
           Minnesota                7.4     9.3         9.9
           Missouri                16.5    16.8        37.6
           Nebraska                 7.5     7.4        11.9
           Ohio                    13.5    12.5        14.5
           South Dakota            22.2     9.9        21.5
           Wisconsin                2.2     5.6         6.1
           State wghtd. avg. 1/     9.3     9.4        11.4

September  Illinois                 7.1     2.6         9.2
           Indiana                  6.6     2.3        11.5
           Iowa                     8.7     3.0         7.9
           Michigan                 8.2     3.6         4.5
           Minnesota                7.0     6.9         8.1
           Missouri                14.0     8.3        17.8
           Nebraska                 4.6     5.9        10.8
           Ohio                     8.4     7.3        14.1
           South Dakota            12.5    12.6        15.5
           Wisconsin                1.1     3.0         3.2
           State wghtd. avg. 1/     7.3     4.4         8.0

1/ The state errors are weighted by state production.
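
The weighting described in the footnote can be sketched as
follows. The report does not list the production figures used as
weights, so the weights below are purely hypothetical and the
result is not meant to reproduce the printed averages. Only three
states are shown, to keep the example short.

```python
def weighted_average_error(errors, production):
    """Production-weighted mean of state percent prediction errors."""
    total = sum(production[state] for state in errors)
    weighted = sum(errors[state] * production[state] for state in errors)
    return weighted / total

# August model errors for three states, taken from Table 3.
errors = {"Illinois": 10.3, "Iowa": 7.6, "Wisconsin": 2.2}

# HYPOTHETICAL production weights (not from the report).
production = {"Illinois": 4.0, "Iowa": 4.0, "Wisconsin": 2.0}

avg = weighted_average_error(errors, production)
print(f"weighted average error: {avg:.1f} percent")
```

States with larger production pull the average toward their own
error, so a large error in a small state moves the weighted figure
only slightly.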


Table 3 presents the results for the composite model developed in
the last section. Forecasts were made for the 10-state region and
for each state individually. In August each state model except
Missouri's used number of stalks (s); the Missouri model used ears
and ear shoots (eaes). None of the August state models used
size data. In September six state models consisted of ears and ear
shoots x kernel row length (eaes x krl), and the other four used
ears with kernels x length over husk (ek x loh). In each month the
state forecasts were then prorated to sum to the predicted value
from the 10-state model. The objective yield indication is adjusted
for bias according to the current Board charting system.
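
The proration step can be illustrated with a minimal sketch. It
assumes simple proportional scaling, in which each state forecast
is multiplied by the same factor so that the states sum to the
10-state forecast; the report does not spell out the exact
procedure. The state labels and figures are hypothetical.

```python
def prorate(state_forecasts, regional_forecast):
    """Scale state forecasts proportionally to sum to the regional forecast."""
    total = sum(state_forecasts.values())
    factor = regional_forecast / total
    return {state: value * factor for state, value in state_forecasts.items()}

# HYPOTHETICAL state-level forecasts and regional forecast.
state_forecasts = {"State A": 1.20, "State B": 0.90, "State C": 0.40}
regional_forecast = 2.40

adjusted = prorate(state_forecasts, regional_forecast)
for state, value in adjusted.items():
    print(f"{state}: {value:.3f}")
```

Proportional scaling leaves each state's share of the total
unchanged, so the relative ranking of the state forecasts is
preserved.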

In August the model was equal to the ASB (9.3 versus 9.4 percent
average error) and 18 percent more accurate than the objective
yield models currently used (9.3 versus 11.4 percent).


DISCUSSION 


In September the ASB improved considerably, to an average error
of only 4.4 percent, while the model improved to 7.3 percent. The
model outperformed the current objective yield models by 9 percent
(7.3 versus 8.0 percent).
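
The 18 percent and 9 percent figures quoted for the comparison
with the current objective yield models are consistent with a
relative reduction in average error, computed from the weighted
averages in Table 3. The sketch below assumes that interpretation,
which the report does not state explicitly.

```python
def relative_improvement(model_error, other_error):
    """Percent reduction in average error relative to the other method."""
    return (other_error - model_error) / other_error * 100.0

# State weighted average errors from Table 3 (model, current OY models).
august = relative_improvement(9.3, 11.4)
september = relative_improvement(7.3, 8.0)

print(f"August:    {august:.1f} percent")
print(f"September: {september:.1f} percent")
```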

An examination of the errors in Table 2 reveals that August 1983
was the occasion on which this model significantly outperformed
the ASB. In that drought year the model recognized the small ear
sizes and low counts before the Board did. In more normal yielding
years the errors of the Board were on average smaller.

The two primary criticisms of this approach are that it uses a
limited number of years and that it does not outperform the ASB
(although the two are on average equal in August). These concerns
will now be addressed.

The philosophy behind limiting the number of years is to keep the
model adaptable to changing conditions. The correctness of the
model specification depends on the relationship being a straight
line through the origin. Normally this relationship cannot be
expected to hold over a long series of years, because changes in
underlying conditions induce a time trend. The mechanism for
adjusting for this time trend is to delete years from the beginning
of the series until the nonlinearity is removed. For this data set
the relationship is linear beginning in 1980. The model is
deliberately specified to have only one slope parameter and a small
variance around the fitted line, so that a very small n is
sufficient. Also, each observation represents an entire survey of
data, so the amount of information going into the model is equal to
that of the current models. It could be said that this model is
able to use even more information, because it does not require that
outliers in the raw data be modified or deleted.
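
A straight line through the origin with a single slope parameter
can be fitted from very few observations, as the following sketch
shows. It uses ordinary least squares with equal error variance,
which is an assumption made here for illustration; the report's own
estimator is defined in an earlier section and may differ. The data
values are hypothetical, with x standing for the survey-based
independent variable and y for final production.

```python
import math

def fit_through_origin(x, y):
    """Least squares fit of y = b * x with no intercept.

    Returns the slope, the residual variance on n - 1 degrees of
    freedom, and the sum of squares of x.
    """
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    slope = sxy / sxx
    residuals = [yi - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (len(x) - 1)
    return slope, s2, sxx

def predict(x0, slope, s2, sxx):
    """Forecast at x0 and the standard error of the prediction."""
    forecast = slope * x0
    std_err = math.sqrt(s2 * (1.0 + x0 * x0 / sxx))
    return forecast, std_err

# HYPOTHETICAL six-year series (not data from the report).
x = [10.0, 12.0, 14.0, 8.0, 13.0, 15.0]
y = [5.1, 5.9, 7.1, 3.9, 6.6, 7.4]

slope, s2, sxx = fit_through_origin(x, y)
forecast, std_err = predict(11.0, slope, s2, sxx)
print(f"slope {slope:.4f}, forecast {forecast:.2f}, std. error {std_err:.2f}")
```

With only one parameter to estimate, six observations leave five
degrees of freedom for the residual variance, which is why a short
series can be adequate when the scatter about the line is small.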

The second criticism is that the model does not do as well as the
ASB in September. One aspect that is disadvantageous to the
model’s historical performance is that this performance is measured
at the state level. The Board uses a balance sheet approach to
force the state sums to a prespecified total, and the allocating
procedures are then subjective. This can cause some problems for
the state level models, which remain objective functions of
acreage, counts and sizes. Also, the measure of forecasting
accuracy for the September ASB depends on the final ASB, and the
two may not be entirely independent.


CONCLUSION


Based on these results this approach is a better way to model the
objective yield data than that currently used. This paper has
presented the exact form of the model and how to form the
estimates of counts per unit area and average sizes. These
estimates are seen to be unbiased, and the form of the model
satisfies underlying regression assumptions very well. Confidence
intervals on predicted values for production are an immediate
consequence of this formulation. The work of Warren and Cook on
developing an early season estimate of final ear weight using
weather data also fits into this framework. The current estimate of
size per ear is interchangeable with their estimator. The carefully
defined structure of the independent variable as the product of
three separately estimated components allows this flexibility.
Because of these characteristics and others already presented, this
approach can be said to be well documented, with results superior
to those of the current objective yield modeling system.

Also, as more years become available we may be able to add
another variable to the regression (use a two-variable no-intercept
model) and not be tied to composite estimation.


RECOMMENDATIONS


I recommend that the current objective yield modeling system for
corn be augmented with this simple, direct, and competitive
modeling system.



BIBLIOGRAPHY 


[1]  Allen,  D.M.,  "Mean  Square  Error  of  Prediction  as  a  Criterion 
for  Selecting  Variables,"  Technometrics,  13,  1971,  pp.  469-475. 

[2]  DeGroot,  Morris  H.,  Probability  and  Statistics,  Don  Mills, 
Ontario:  Addison  Wesley,  1975. 

[3] Fecso, Ronald S., "A Study of Walnut Production Forecasting,"
Staff Report, California Crop and Livestock Reporting Service,
1975.

[4]  Houseman,  Earl  E.,  "Composite  Estimation,"  Staff  Report,  U.S. 
Department  of  Agriculture,  Statistical  Reporting  Service. 

[5]  Rao,  C.  Radhakrishna,  Linear  Statistical  Inference  and  Its 
Applications,  New  York:  John  Wiley  and  Sons,  1973. 

